High tide in Venice
You are asked to analyze a dataset concerning the high tides in Venice. These data are used by the “Centro Previsioni e Segnalazioni Maree” to produce forecasts of the high tide. The Centro publishes its current prediction in Italian.
General documentation
The analysis of high tides in Venice has a very long history, and this complex procedure is handled by the “Centro Previsioni e Segnalazioni Maree”, located in Palazzo Cavalli. Their website contains interesting material you are encouraged to read.
If you are unfamiliar with this phenomenon, you may want to read this short booklet (ITA and ENG) to get an overview. This additional booklet (ITA) is also quite informative.
Dataset description
The data can be downloaded here (venice_meteo.zip). Inside the zip archive you will find several datasets, one for each of the following meteorological stations:
- Palazzo Cavalli (ITA) (Stazione_PalazzoCavalli.csv)
- Burano (ITA) (Stazione_Burano.csv)
- ISMAR-CRN platform (ITA) (Stazione_Piattaforma.csv and Stazione_Piattaforma_pg.csv)
- Isola di San Giorgio (ITA) (Stazione_SanGiorgio.csv)
- Misericordia (ITA) (Stazione_Misericordia.csv)
- Punta Salute (ITA) (Stazione_PuntaSalute_CanalGrande.csv)
- Lido faro (ITA) (Stazione_DigaSudLido.csv)
Each station tracks different kinds of information, depending on the sensors that have been installed. Detailed documentation (ITA) of these datasets is made available by the Centro Maree for each of the above locations. Please refer to that documentation for the variable descriptions.
There are also three additional files:
- Meteorological data (Dati_Meteo.csv)
- Astronomical tide (astronomical_tide_2022.json and astronomical_tide_2023.json)
There is substantial overlap between the station data and the meteorological data (ITA and ENG). The astronomical data are obtained from a mathematical model, as described here (ITA and ENG).
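As a rough illustration of how such an archive might be read, the sketch below loads every .csv and .json member of a zip file into a dictionary of pandas DataFrames. The archive used here is synthetic and built in memory: the member names, the column names (timestamp, level, value), the ";" separator, and the list-of-records JSON layout are all assumptions made for this example, not a description of the real files. Inspect the actual files and the Centro Maree documentation before adapting it.

```python
import io
import json
import zipfile

import pandas as pd


def load_archive(zip_source, sep=","):
    """Read every .csv and .json member of a zip archive.

    Returns a dict mapping each member name to a DataFrame.
    `sep` is the CSV delimiter; the right value for the real files
    is an assumption that must be checked by opening them.
    """
    frames = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8")
            if name.endswith(".csv"):
                frames[name] = pd.read_csv(io.StringIO(text), sep=sep)
            elif name.endswith(".json"):
                # read_json handles a list of records directly; a nested
                # layout would need json.loads plus pd.json_normalize instead.
                frames[name] = pd.read_json(io.StringIO(text))
    return frames


def build_demo_archive():
    """Create a tiny in-memory zip with hypothetical contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "Stazione_Demo.csv",
            "timestamp;level\n2022-11-15 00:00;0.41\n2022-11-15 00:05;0.43\n",
        )
        records = [
            {"timestamp": "2022-11-15 00:00", "value": 35},
            {"timestamp": "2022-11-15 01:00", "value": 28},
        ]
        archive.writestr("astronomical_demo.json", json.dumps(records))
    buffer.seek(0)
    return buffer


frames = load_archive(build_demo_archive(), sep=";")
for name, frame in frames.items():
    print(name, frame.shape)
```

For the real data, the same function could be called with the path to venice_meteo.zip in place of the in-memory buffer, once the delimiter and the JSON layout have been verified.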
Homework rules
- You will work in groups.
- You need to submit the following files:
  - A Python notebook (tidy_homework_group_name.ipynb) that takes the files in venice_meteo.zip as input and produces a clean dataset as output;
  - A comma-separated file which contains the tidy dataset (tidy_dataset_group_name.csv) produced by the above Python notebook.
You are asked to create a single dataset in which the rows are hourly observations ranging from November 15th, 2022, to January 31st, 2023, combining the information that has been made available. At the very least, you should:
- Understand how to import .json files using pandas;
- Merge all the above datasets;
- Delete the redundant variables and translate their names into English;
- Exclude observations that fall outside the period from 2022-11-15 to 2023-01-31;
- Aggregate some variables, so that observations are recorded hourly rather than every 5 minutes. You will need to think about an appropriate aggregation method for each variable (sum, mean, max, min, and median are possible candidates);
- Export the tidy dataset into a file tidy_dataset_group_name.csv.
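The filtering, aggregation, merging, and export steps can be sketched on synthetic data as follows. Everything specific here is an assumption for illustration: the column names (sea_level_cm, rain_mm, wind_speed_ms, astronomical_tide_cm) are hypothetical, and the aggregation chosen for each one (mean, sum, max) is only one plausible option, not a recommendation for the real variables.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute station readings; all column names are hypothetical.
index = pd.date_range("2022-11-14 22:00", "2022-11-15 01:55", freq="5min")
station = pd.DataFrame(
    {
        "sea_level_cm": np.arange(len(index), dtype=float),
        "rain_mm": 0.5,
        "wind_speed_ms": np.linspace(1.0, 5.0, len(index)),
    },
    index=index,
)

# Synthetic hourly astronomical tide ("60min" is one hour).
astro = pd.DataFrame(
    {"astronomical_tide_cm": [10, 20, 30, 40]},
    index=pd.date_range("2022-11-14 22:00", periods=4, freq="60min"),
)

# 1. Keep only the requested period. Slicing a sorted DatetimeIndex
#    with date strings includes the whole of the end day.
window = station.loc["2022-11-15":"2023-01-31"]

# 2. Aggregate from 5-minute to hourly rows, one method per variable.
hourly = window.resample("60min").agg(
    {
        "sea_level_cm": "mean",   # a level: average over the hour
        "rain_mm": "sum",         # an accumulation: total over the hour
        "wind_speed_ms": "max",   # a peak value: hourly maximum
    }
)

# 3. Merge with the hourly astronomical series on the timestamp index.
tidy = hourly.join(astro, how="left").rename_axis("datetime")

# 4. Export. With no path, to_csv returns the CSV text; pass
#    "tidy_dataset_group_name.csv" to write the file to disk instead.
csv_text = tidy.to_csv()
print(csv_text)
```

The dictionary passed to agg makes the per-variable choice explicit, which is useful when some variables are accumulations and others are instantaneous readings. A left join keeps exactly the hourly rows of the station data, so timestamps that exist only in the other table are dropped.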
Any additional fixes and improvements you wish to perform are welcome, as long as they increase the “tidiness” of the final dataset tidy_dataset_group_name.csv.